The main strength of deep learning over conventional machine learning (ML)
techniques has enabled direct applications in image processing, speech
recognition, medical imaging, machine translation, robotics, computer
vision, and numerous other fields. The purpose of this study is to assess
deep learning algorithms for persons with autism. Autism is a
developmental disorder that causes significant communicative, social,
and behavioral difficulties in those who have it. This research paper
discusses an enhanced version of the convolution network. Visual geometry group
(VGG) is a convolution neural network (CNN) model that retains the essential
features of a CNN. The VGG 16 net is employed
to classify this disorder with improved accuracy. The image data are first
preprocessed, and the VGG 16 convolution network is then used to classify
between autism spectrum disorder (ASD) and non-ASD. Finally, the algorithm's
efficacy is assessed based on its accuracy. The VGG 16 net algorithm produces
better results, with an accuracy of 68.54%, compared with the normal CNN algorithm.
1. INTRODUCTION

Autism spectrum disorder (ASD) is a neurological condition that affects children between the ages of 6
and 17 and causes impairments in communication abilities and social interaction. Patients with ASD
engage in repetitive activities, and the disorder damages social bonds as well as communication.
According to the World Health Organization (WHO), one in every 160 children has ASD,
and these children usually have other disorders such as depression, anxiety, and attention deficit
hyperactivity disorder (ADHD). Early detection during childhood is critical for controlling and treating
this condition, for improving communication and social skills in children with ASD, and for improving their
quality of life. Autism is referred to as a "spectrum condition" because of the vast range of symptoms
that individuals experience [1]. It is a developmental and neurological disorder that begins in childhood,
develops throughout the first few years of life, and persists throughout one's life. It affects
how a person interacts with people, communicates, and learns, and it disrupts the communication and
conduct of affected individuals. Psychologists use the “Diagnostic and Statistical Manual of Mental Disorders
(DSM)”, published by the American Psychiatric Association, to diagnose a variety of
psychological illnesses. According to the DSM-5, there are five types of ASD, and a person can be diagnosed
with one or more of them. In 2016, the DSM listed five
major types of ASD [2]:
a) Mental retardation (an issue with learning).

b) Impaired linguistic ability (postpones the acquisition of linguistic skills).

c) Deficiency due to hereditary or lifestyle variables.

d) A neurological, mental, or social disorder.

e) Insomnia (strange motions).

2. AUTISM AND CONVOLUTION NEURAL NETWORK 

Because there are no medical tests for it, such as blood or sugar tests, ASD is extremely difficult to
diagnose. The diagnosis is based on the child's behaviour and the parents' observations; if the
parents of an autistic child are unable to describe the child's symptoms accurately, it is extremely difficult for a
doctor to determine the correct form of autism. According to medical research, autism has no known cure,
although therapies and treatments are available that can improve the patient's behaviour to some extent [3]-[5]. Therapy
can also help to ease some of the symptoms. The convolution neural network (CNN) has received a great deal of
interest in the field of classification in recent years. CNNs are robust classifiers that achieve high accuracy in a
wide range of applications. In addition, CNN models extract features accurately
and can handle a large number of free parameters. The convolution layers, pooling layers,
activation functions, and fully connected layers are the primary components of a CNN model [6].
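
As a rough illustration of how the components listed above fit together, the following minimal sketch applies one convolution filter, a ReLU activation, and 2*2 max pooling to a tiny hand-made image in plain Python. The image values, the vertical-edge kernel, and all function names are illustrative assumptions and are not taken from the study.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over the image with stride 1 and no padding."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive values and replace negative values with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2*2 window."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


# Hypothetical 4*4 single-channel image and a 3*3 vertical-edge kernel.
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = conv2d_valid(image, kernel)  # [[-2, -2], [2, -2]]
activated = relu(feature_map)              # [[0, 0], [2, 0]]
pooled = max_pool_2x2(activated)           # [[2]]
print(feature_map, activated, pooled)
```

A fully connected layer would then take the flattened pooled values as its input; it is omitted here to keep the sketch short.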


3. VGGnet AND ITS ARCHITECTURE 

The visual geometry group (VGG) network is one model of the
convolution neural network. It is a typical deep convolution network, where “deep” refers to the number of layers:
16 or 19 weight layers [7]. The VGG architecture is shown in Figure 1. Simonyan
and Zisserman of the University of Oxford proposed this architecture, which is also called VGGnet. It is
built on the most important features of a CNN. A VGG-based convolution network takes an input image
of 224*224 pixels. The pre-processing stage takes an RGB image with pixel values ranging from 0 to 255 and
subtracts the mean image value computed over the entire ImageNet training set [8]-[11].
Figure 1. VGG architecture


The input images are preprocessed first, and the weighted layers are then applied [11], [12]. A stack of convolution
layers processes the preprocessed images. The VGG 16 architecture has a total of 13 convolutional layers and
three fully connected layers. VGG 16 employs small 3*3 filters to achieve
greater depth. The VGG 19 variant has a total of 19 weighted layers, comprising 16 convolutional layers and three
fully connected layers, and it keeps the same five max-pooling layers. Both versions of the VGGNet include
two fully connected layers with 4096 channels each. These layers are followed by a further
fully connected layer that consists of 1000 channels and is used to categorise 1000 labels [13]-[15]. The
final layer is a softmax layer, which performs the classification. VGG Architecture:

a) Input: The VGGNet accepts images with a fixed size of 224*224 and three channels (R, G, and B). The
preprocessing normalizes the RGB values of each pixel by subtracting the mean
value from every pixel.

b) Convolution layer: The image is first passed through a stack of two convolution layers with 3*3 filters, each
followed by a rectified linear unit (ReLU) activation. Each layer contains 64 filters. The stride and padding
of the convolution layers are fixed at 1 pixel. The stride is the number of pixels by which the filter shifts over
the input matrix at each step; it controls how the filter moves across the picture or video frame. The output of the
activation is then passed to spatial max pooling with a 2*2 window and a stride of 2, which reduces the feature map size
from 224*224*64 to 112*112*64. The activations then flow through a similar stack of two convolutional layers
with 128 filters instead of 64, followed by max pooling, so the size after the second stack becomes
56*56*128. The third stack consists of three convolutional layers with 256 filters and a max pool layer,
which gives an output size of 28*28*256. Lastly, two stacks of three convolutional layers with 512 filters
each, each followed by max pooling, are applied, which gives an output size of 7*7*512 [16]-[18]. The detailed
convolution neural network configuration of the VGG net is shown in Figure 2 and Figure 3.

c) Hidden layers: The ReLU (rectified linear unit) activation function is applied on all hidden layers in the VGG
network. It is a piecewise linear function that returns the input when the input is positive and
zero otherwise. As mentioned, the stride of the convolutional layers is fixed at 1 pixel.

d) Fully-connected layers: There are three fully connected layers in VGG. The 7*7*512 output is flattened and fed
into the fully connected layers. The first two layers have 4096 channels (neurons) each, and the third has
1000 channels, one for each class.
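
The feature map sizes quoted in the list above can be checked with a short calculation. The sketch below is only a shape trace, not a trained network: it assumes 3*3 convolutions with stride 1 and padding 1, 2*2 max pooling with stride 2, and the stack layout described in the text. The function and variable names are illustrative.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """Spatial size after a max pooling layer."""
    return (size - window) // stride + 1


# (number of convolution layers, number of filters) for each VGG 16 stack.
VGG16_STACKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_shapes(input_size=224):
    """Return the (height, width, channels) shape after each stack."""
    size = input_size
    shapes = []
    for num_convs, filters in VGG16_STACKS:
        for _ in range(num_convs):
            size = conv_output_size(size)  # 3*3 with padding 1 keeps the size
        size = pool_output_size(size)      # 2*2 pooling halves the size
        shapes.append((size, size, filters))
    return shapes


shapes = trace_shapes()
for index, shape in enumerate(shapes, start=1):
    print(f"after stack {index}: {shape[0]}*{shape[1]}*{shape[2]}")

# Number of values fed into the first fully connected layer.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print("flattened features:", flattened)
```

The trace ends at 7*7*512, which flattens to 25088 values before the first 4096-channel fully connected layer.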

VGG proposes replacing a single large receptive field with a stack of small ones. The stack contains several
nonlinear rectification layers instead of a single one, and it reduces the number of parameters, which helps
to achieve better results.
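
The parameter saving can be made concrete with a small calculation. The sketch below compares three stacked 3*3 convolution layers, which together cover a 7*7 receptive field, with one 7*7 layer. The channel count of 256 is an arbitrary assumption chosen for illustration, and bias terms are ignored.

```python
def conv_weights(kernel, channels):
    """Weights in one convolution layer with equal input and output channels."""
    return kernel * kernel * channels * channels


def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return layers * (kernel - 1) + 1


channels = 256  # assumed channel count, for illustration only

stacked = 3 * conv_weights(3, channels)  # three 3*3 layers: 27 * C^2
single = conv_weights(7, channels)       # one 7*7 layer:    49 * C^2

print("receptive field of three 3*3 layers:", stacked_receptive_field(3, 3))
print("weights, three 3*3 layers:", stacked)
print("weights, one 7*7 layer:   ", single)
print("saving: {:.0%}".format(1 - stacked / single))
```

Under these assumptions the stacked design needs 27/49 of the weights of the single large filter, while also applying three rectifications instead of one.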
Figure 1. The convolution neural network configuration of the VGG net

Figure 2. Different versions of VGG convolution network


4. VARIANT OF VGG

VGG was designed to reduce the number of parameters in the convolution layers and to
improve training time. Several variants of VGGNet are available, such as VGG16 and VGG19, which
differ in the number of layers in the network. This study discusses five variants of the VGG convolution
network, as shown in Figure 4. There are two versions of VGG 16, namely C and D. They differ
little, except that D uses a filter size of 3*3 instead of 1*1 for some convolution
layers [19]-[21]. They have 134 million and 138 million parameters, respectively. The major problem of VGG 16 is
that it is slow to train and requires a lot of storage space and bandwidth, which makes it inefficient. The VGG 19
convolution network is 19 layers deep, with 16 convolution layers and 3 fully connected layers, plus 5 max pool
layers and 1 softmax layer [22], [23].
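
The parameter counts quoted for configurations C and D can be reproduced with a short calculation. The sketch below assumes the layer layout of the original VGG configurations (C uses 1*1 filters for the third convolution of the last three stacks), a 224*224*3 input, and fully connected layers of 4096, 4096, and 1000 channels; weights and biases are both counted. The names are illustrative.

```python
# Each tuple is (kernel size, output channels); "M" marks a max pool layer.
CONFIG_D = [
    (3, 64), (3, 64), "M",
    (3, 128), (3, 128), "M",
    (3, 256), (3, 256), (3, 256), "M",
    (3, 512), (3, 512), (3, 512), "M",
    (3, 512), (3, 512), (3, 512), "M",
]

# Configuration C replaces the last 3*3 filter of stacks 3 to 5 with 1*1.
CONFIG_C = [
    (3, 64), (3, 64), "M",
    (3, 128), (3, 128), "M",
    (3, 256), (3, 256), (1, 256), "M",
    (3, 512), (3, 512), (1, 512), "M",
    (3, 512), (3, 512), (1, 512), "M",
]


def count_parameters(config, in_channels=3, input_size=224,
                     fc_sizes=(4096, 4096, 1000)):
    """Count weights and biases of the convolution and fully connected layers."""
    total = 0
    channels = in_channels
    size = input_size
    for layer in config:
        if layer == "M":
            size //= 2  # 2*2 pooling with stride 2 halves the spatial size
            continue
        kernel, out_channels = layer
        total += kernel * kernel * channels * out_channels + out_channels
        channels = out_channels
    features = size * size * channels  # flattened input to the first FC layer
    for units in fc_sizes:
        total += features * units + units
        features = units
    return total


params_c = count_parameters(CONFIG_C)
params_d = count_parameters(CONFIG_D)
print(f"configuration C: {params_c:,} parameters")
print(f"configuration D: {params_d:,} parameters")
```

Rounded to the nearest million, the totals agree with the 134 million and 138 million figures given above. Most of the parameters sit in the first fully connected layer, which helps explain the storage cost of VGG 16.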
Figure 3. The convolution neural network configuration of the VGG net [24]


5. RELATED DATA 

The facial dataset of autistic children, named the autistic children facial dataset, was obtained from the Kaggle
repository, where it is owned by Imran Khan [5], [10], [25], [26]. The dataset contains 2940 photos of children's
faces, distributed evenly between subfolders of autistic and non-autistic images. The photos show children aged
2 to 8 years. The male-to-female ratio is 3:1 in the autistic class and 1:1 in the non-autistic class [26].


6. METHOD 

To classify autistic and non-autistic children, we deployed the VGG 16 convolution network in this
study. The proposed framework is depicted in Figure 5. We used Google Colab, Google's cloud-based service, as the
environment and Python 3 as the programming language. The
virtual tensor processing unit (TPU) from Google Colab was used to speed up the execution of the classification
algorithms. After data preparation, the entire dataset was split in an 80:20
proportion, with 80% used as the training set and the remaining 20% reserved for testing. The photos in
the dataset were of different sizes, so we used the resize function of Python OpenCV to bring all of the
photographs to the same size. Color space conversion was then
conducted: all of the photographs were converted from the BGR color format to grayscale. Finally, the
preparation phase was completed by converting all of the photos into arrays for further processing.
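
The 80:20 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the file names are placeholders, the dataset size of 2940 is taken from Section 5, and the shuffling seed is an assumption, since the study does not state how the split was randomized.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle the items reproducibly and split them into train and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed value is an assumption
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder file names standing in for the 2940 face images.
image_paths = [f"image_{index:04d}.jpg" for index in range(2940)]

train_paths, test_paths = split_dataset(image_paths)
print("training images:", len(train_paths))
print("testing images: ", len(test_paths))
```

Fixing the seed makes the split repeatable between runs, which keeps accuracy figures comparable when other settings change.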
Figure 4. Proposed work flow of method


7. RESULTS AND DISCUSSION 

CNNs are now widely used for the classification of datasets. In this paper,
we used the facial dataset of children with autism, named the autistic children facial dataset, which is taken from
Kaggle and owned by Imran Khan. It consists of 5874 images of children, with subdirectories for autism and
non-autism. During training, the learning rate was set to 0.005, the batch size to 32, and the
number of epochs to 5. Input images of 224*224 were given to the network. The execution time of
this work was about 8 hours and 30 minutes using the Google Colab TPU. We attained an accuracy of 70.22%. We
observed that accuracy improves as the number of epochs increases. Figure 6 shows the intersection
between training and validation accuracy. Model loss and model accuracy are shown in Figure 7 and
Figure 8, respectively. When applying only batch normalization or heavy dropout to the extremely
deep model, we found that the model gained accuracy; nevertheless, this accuracy was not very high.
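
For reference, the accuracy metric used to judge the classifier is the fraction of test images whose predicted class matches the true class. The sketch below shows this calculation on a handful of made-up labels (1 for ASD, 0 for non-ASD); the labels are illustrative and are not results from the study.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels for ten test images: 1 = ASD, 0 = non-ASD.
true_labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
predicted_labels = [1, 0, 0, 0, 1, 1, 1, 0, 1, 1]

score = accuracy(true_labels, predicted_labels)
print(f"accuracy: {score:.2%}")
```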
Figure 8. Model accuracy


8. CONCLUSION

In this work, we developed a CNN model to distinguish ASD from control participants. Our suggested
CNN architecture can provide improved classification performance with fewer parameters, reducing training
time. As a result, our suggested model is less complicated and quicker than existing similar models. To classify
autistic and non-autistic children, the VGG net convolution network was applied to a dataset of images. The
suggested model was shown to perform effectively, particularly with fewer pictures. We also showed
how the accuracy is affected by the number of epochs, and found that 25 epochs are adequate for
effective training of the proposed model.